Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech
Abstract
Speech synthesis and voice conversion techniques can pose threats to current speaker verification (SV) systems. It is therefore essential to develop front-end systems that can distinguish human speech from spoofed speech (synthesized or voice converted). In this paper, for the ASVspoof 2015 challenge, we propose a detector based on a combination of cochlear filter cepstral coefficients (CFCC) and change in instantaneous frequency (IF), i.e., CFCCIF, to detect natural vs. spoofed speech. The CFCCIF features were extracted at the frame level, and a Gaussian mixture model (GMM)-based classification system was used. On the development set, the proposed CFCCIF features, after fusion with Mel frequency cepstral coefficient (MFCC) features, achieved an equal error rate (EER) of 1.52 %, a significant reduction from MFCC (3.26 %) and CFCCIF (2.29 %) alone using 12-D static features. The EER further decreases to 0.89 % and 0.83 % for delta and delta-delta features, respectively. Experimental results on the evaluation set show that fusion of MFCC and CFCCIF works relatively well, with an EER of 0.41 % for known attacks and 2.013 % for unknown attacks. On average, fusion of MFCC and CFCCIF features yielded the comparatively best EER of 1.211 % for the challenge.
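The abstract reports results as equal error rates for single feature sets and for their fusion. As a minimal sketch of how such numbers are typically obtained, the Python below computes an EER from per-utterance scores and combines two systems' scores with a weighted sum. The abstract does not state how the fusion was performed, so the weighted-sum rule, the weight `alpha`, the function names, and the convention that higher scores mean "more likely natural" are all assumptions made for illustration.

```python
import numpy as np


def compute_eer(genuine, spoofed):
    """Return the equal error rate (as a fraction) for two score arrays.

    Assumption: higher scores mean "more likely natural speech".
    """
    genuine = np.asarray(genuine, dtype=float)
    spoofed = np.asarray(spoofed, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([genuine, spoofed]))
    # False rejection rate: natural trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoofed >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Weighted sum of two per-utterance score arrays.

    `alpha` is a hypothetical fusion weight, not a value from the paper.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return alpha * a + (1.0 - alpha) * b
```

In use, one would score each utterance with the MFCC-based and CFCCIF-based classifiers, pass both score arrays through `fuse_scores`, and call `compute_eer` on the fused scores of natural and spoofed trials; the fusion weight would normally be tuned on a development set.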
Similar resources
Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features
This work describes the techniques used for spoofed speech detection for the ASVspoof 2017 challenge. The main focus of this work is on exploiting the differences in the speech-specific nature of genuine speech signals and spoofed speech signals generated by replay attacks. This is achieved using glottal closure instants, epoch strength, and the peak to side lobe ratio of the Hilbert envelope o...
Allpass modelling of Fourier phase for speaker verification
This paper proposes features based on parametric representation of Fourier phase of speech for speaker verification. Direct computation of Fourier phase suffers from phase wrapping and hence we attempt parametric modelling of phase spectrum using an allpass (AP) filter. The coefficients of the AP filter are estimated by minimizing an entropy based objective function motivated from speech produc...
Spoofing Detection on the ASVspoof2015 Challenge Corpus Employing Deep Neural Networks
This paper describes the application of deep neural networks (DNN), trained to discriminate between human and spoofed speech signals, to improve the performance of spoofing detection. In this work we use amplitude, phase, linear prediction residual, and combined amplitude phase-based acoustic level features. First we train a DNN on the spoofing challenge training data to discriminate between hu...
Relative phase information for detecting human speech and spoofed speech
The detection of human and spoofed (synthetic/converted) speech has started to receive more attention. In this study, relative phase information extracted from a Fourier spectrum is used to detect human and spoofed speech. Because original/natural phase information is almost entirely lost in spoofed speech using current synthesis/conversion techniques, a modified group delay based feature, the ...
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...